Track bundle resource state sizes in telemetry (direct engine) #5199
shreyas-goenka wants to merge 1 commit into
Approval status: pending
Commit: 1cab854
db := dstate.NewDatabase("", 0)

pattern := dyn.NewPattern(dyn.Key("resources"), dyn.AnyKey(), dyn.AnyKey())
_, err := dyn.MapByPattern(b.Config.Value(), pattern, func(p dyn.Path, v dyn.Value) (dyn.Value, error) {
Why walk the config and not the actual state? They might not match 1-1.
To capture the state for both terraform and direct deployments. Most customers are still on terraform, so this gives us approximate stats for the state sizes.
I see. In that case we should not call it StateFileSize, since it has nothing to do with the state file; we should call it ConfigFileSize.
It does try to approximate the state size - by calling PrepareState:
target := cfg
if adapter, ok := adapters[resourceType]; ok {
	state, err := adapter.PrepareState(cfg)
	if err != nil {
		return nil, fmt.Errorf("prepare state: %w", err)
	}
	target = state
}
// dstate.SaveState writes resource state with MarshalIndent using these
// exact prefix/indent arguments; matching them here means each resource's
// byte length equals len(entry.State) on disk for direct deploys.
raw, err := json.MarshalIndent(target, " ", " ")
if err != nil {
	return nil, fmt.Errorf("marshal: %w", err)
}
return raw, nil
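The point about matching the prefix/indent arguments can be checked in isolation. The following is a minimal sketch, not code from this PR: `marshaledSize` and the sample payload are hypothetical names used only for illustration. It shows that the byte length produced by `json.MarshalIndent` depends on the indent string, which is why an approximation only matches the on-disk size if it uses the same arguments as the writer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshaledSize returns the number of bytes json.MarshalIndent produces for v
// with the given prefix and indent. It is a hypothetical helper for this
// sketch, not a function from the PR.
func marshaledSize(v any, prefix, indent string) (int, error) {
	raw, err := json.MarshalIndent(v, prefix, indent)
	if err != nil {
		return 0, fmt.Errorf("marshal: %w", err)
	}
	return len(raw), nil
}

func main() {
	// Illustrative payload only; real resource state is produced by the adapters.
	state := map[string]any{"name": "job", "tasks": []string{"a", "b"}}

	narrow, err := marshaledSize(state, "", "")
	if err != nil {
		panic(err)
	}
	wide, err := marshaledSize(state, "", "  ")
	if err != nil {
		panic(err)
	}
	// The same value yields different byte counts for different indent strings.
	fmt.Println("indent \"\":", narrow, "bytes; indent \"  \":", wide, "bytes")
}
```

The difference grows with nesting depth and element count, so a mismatch in indent arguments skews large resources the most.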
var fileSize int64
if len(db.State) > 0 {
	raw, mErr := json.MarshalIndent(db, "", " ")
Why not os.Stat the actual state file to get the size?
// dstate.SaveState writes resource state with MarshalIndent using these
// exact prefix/indent arguments; matching them here means each resource's
// byte length equals len(entry.State) on disk for direct deploys.
raw, err := json.MarshalIndent(target, " ", " ")
This will add some overhead for large bundles (e.g. Python-generated ones with a lot of jobs).
Adds a `resources_metadata` field to the bundle deploy telemetry event with, per resource type, the count and the max/mean/median state size in bytes, plus the whole state file size.

Only direct deploys are measured, and collection does no marshalling, file read, or JSON parsing of its own. The direct engine already serializes each resource's state during the deploy and reconstructs it via WAL replay in `Finalize`; `ExportStateFromData` now records each entry's `len(state)` on the `ResourceState` it returns. `deployCore` stashes that finalized state on `b.Metrics`, and telemetry reads the per-resource sizes straight off the in-memory map. The whole-file size comes from a single `os.Stat` (no read/parse). Terraform stores state differently and is not collected (the field is absent there).

Because the metadata is direct-only it diverges across the `DATABRICKS_BUNDLE_ENGINE` test matrix, so the shared `telemetry/deploy` golden omits it; the logic is covered by unit tests.

The universe proto (`resources_metadata`, `BundleResourcesMetadata`, `ResourceMetadata`) is already merged, so this is ingested rather than dropped.

Co-authored-by: Isaac
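As an illustration of the per-type aggregation described above (count plus max/mean/median of state sizes), here is a minimal sketch. The `sizeStats` type, the function names, and the integer rounding are assumptions made for this example; the actual `ResourceMetadata` proto fields and the PR's implementation may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeStats is a hypothetical summary shape for this sketch; the real
// telemetry proto may use different field names and types.
type sizeStats struct {
	Count  int
	Max    int64
	Mean   int64
	Median int64
}

// summarize computes count, max, mean, and median over byte sizes.
// Mean and the even-length median use integer division, so they truncate.
func summarize(sizes []int64) sizeStats {
	if len(sizes) == 0 {
		return sizeStats{}
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int64(nil), sizes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var total int64
	for _, s := range sorted {
		total += s
	}
	n := len(sorted)
	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return sizeStats{Count: n, Max: sorted[n-1], Mean: total / int64(n), Median: median}
}

// summarizeByType applies summarize to each resource type's list of sizes.
func summarizeByType(sizesByType map[string][]int64) map[string]sizeStats {
	out := make(map[string]sizeStats, len(sizesByType))
	for resourceType, sizes := range sizesByType {
		out[resourceType] = summarize(sizes)
	}
	return out
}

func main() {
	// Illustrative sizes only, not measurements from a real deploy.
	stats := summarizeByType(map[string][]int64{
		"jobs":      {10, 30, 20},
		"pipelines": {10, 20, 30, 40},
	})
	fmt.Printf("%+v\n", stats["jobs"])
	fmt.Printf("%+v\n", stats["pipelines"])
}
```

Because the inputs are plain integers already held in memory, this kind of aggregation adds a sort per resource type and no serialization work.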
This pull request and its description were written by Isaac.